1 Task 1

We need to find which of Dallas/Fort Worth International Airport (DFW) and Los Angeles International Airport (LAX) is the more efficient primary airport for flying in and out of the USA.

1.1 About the data

  • Overview
    • This is open data. There is no obvious license, but it falls under the US open data policy guidelines.
  • The main data extracted from the source has been split into 2 files:
    • The 1st .csv contains all the flights that have DFW or LAX as the departure airport.
    • The 2nd .csv contains all the flights that have DFW or LAX as the arrival airport.
    • These two subsets are sufficient for the task, since each airport can be compared on both its departure and its arrival delays.

1.2 Comparison of US airports

1.2.1 Analysis for LAX and DFW airports when considered as a Departure airport

Table 1.1: Summary of departure delays (in minutes) for DFW and LAX as departure airports
ORIGIN  Median_Dep_Delay  Min_Dep_Delay  Max_Dep_Delay
DFW     -2                -20            1193
LAX     -3                -26            1495

Figure 1.1: Average Departure Delay of DFW and LAX over the period of time

1.2.2 Analysis for LAX and DFW airports when considered as an Arrival airport

Table 1.2: Summary of arrival delays (in minutes) for DFW and LAX as arrival airports
DEST  Median_Arr_Delay  Min_Arr_Delay  Max_Arr_Delay
DFW   -6                -80            2536
LAX   -12               -96            1509

Figure 1.2: Average Arrival Delay of DFW and LAX over the period of time

1.3 Calculations used

  • I am using departure and arrival delay as the prime factors to calculate the efficiency of the airports.
  • In both Table 1.1 and Table 1.2:
    • The data has been grouped by Origin and Destination respectively.
    • Then the median, maximum and minimum delay (departure or arrival, respectively) is calculated using the summarise function.
    • kable is used to view the output in table form, and kable_styling is used to style the table.
  • In both Figure 1.1 and Figure 1.2:
    • The Fl_date attribute was converted to a datetime type so that it can be used in the plot.
    • Then the data was grouped by either “Origin & Fl_date” or “Dest & Fl_date”, depending on the dataset being used.
    • Then avg_delay is calculated for each day using the formula “(min_delay + max_delay)/2”, where delay is either arrival or departure. Note that this is the midpoint of the daily extremes (the midrange), not the arithmetic mean of all delays, so it is strongly influenced by the single longest delay of the day.
    • Finally, a line graph of Fl_date vs avg_delay is plotted using ggplot and geom_line.
    • Additional functions such as geom_point and ggplotly are only used to make the graph more interactive.
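
The steps above can be sketched in R with dplyr as follows. This is a minimal illustration, not the exact code behind the tables and figures: the sample rows are made up, and the column names FL_DATE and DEP_DELAY are assumptions (only ORIGIN appears in Table 1.1). The mean is computed alongside the “(min_delay + max_delay)/2” value only to show how the two can differ.

```r
library(dplyr)

# Hypothetical sample data; the real .csv is assumed to have similar columns.
flights <- data.frame(
  ORIGIN    = c("DFW", "DFW", "DFW", "DFW", "LAX", "LAX", "LAX"),
  FL_DATE   = as.Date(c("2020-01-05", "2020-01-05", "2020-01-05", "2020-01-06",
                        "2020-01-05", "2020-01-05", "2020-01-06")),
  DEP_DELAY = c(-5, 0, 120, 10, -3, 60, 0)
)

# Per-airport summary, in the shape of Table 1.1.
dep_summary <- flights %>%
  group_by(ORIGIN) %>%
  summarise(
    Median_Dep_Delay = median(DEP_DELAY, na.rm = TRUE),
    Min_Dep_Delay    = min(DEP_DELAY, na.rm = TRUE),
    Max_Dep_Delay    = max(DEP_DELAY, na.rm = TRUE)
  )

# Per-airport, per-day values, in the shape of the data behind Figure 1.1.
daily <- flights %>%
  group_by(ORIGIN, FL_DATE) %>%
  summarise(
    # Midpoint of the daily extremes, as described in the report.
    avg_delay  = (min(DEP_DELAY, na.rm = TRUE) + max(DEP_DELAY, na.rm = TRUE)) / 2,
    # Arithmetic mean of all delays that day, shown only for comparison.
    mean_delay = mean(DEP_DELAY, na.rm = TRUE),
    .groups = "drop"
  )

print(dep_summary)
print(daily)
```

Passing `.groups = "drop"` in the second call returns an ungrouped result and avoids the message that `summarise()` otherwise prints when grouping by more than one column.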

1.4 Summary/Conclusion

1.4.1 When the airports are considered for departure

  • In Table 1.1, we can notice the following:
    • The min delay of DFW is -20, meaning its earliest flight left 20 min before the scheduled departure time. This is closer to schedule than LAX (-26).
    • The max delay of DFW is 1193, meaning its latest flight left 1193 min after the scheduled departure time. This is better than LAX (1495).
    • The median delays of the two airports (-2 for DFW, -3 for LAX) also suggest that DFW is slightly better than LAX as a departure airport, as its typical flight departs closer to the scheduled time.
  • In Figure 1.1, we can notice the following:
    • DFW experiences a max avg_delay of 591 min on 11th January, and a min avg_delay of 84 min on both 5th and 7th January.
    • LAX experiences a max avg_delay of 739.5 min on 12th January, and a min avg_delay of 66.5 min on 6th January.

1.4.2 When the airports are considered for Arrival

  • In Table 1.2, we can notice the following:
    • The min delay of LAX is -96, which is better than DFW (-80), as it means a flight arrived 96 min before the scheduled arrival time.
    • The max delay of LAX is 1509, which is better than DFW (2536), as it means the latest flight arrived 1509 min after the scheduled arrival time.
    • The median delays of the two airports (-12 for LAX, -6 for DFW) also suggest that LAX is better than DFW as an arrival airport, as its typical flight arrives further ahead of the scheduled time.
  • In Figure 1.2, we can notice the following:
    • DFW experiences a max avg_delay of 1233.5 min on 21st January, and a min avg_delay of 124 min on 25th January.
    • LAX experiences a max avg_delay of 734.5 min on 17th January, and a min avg_delay of 22 min on 23rd January.

Hence, we can conclude that LAX is the more efficient airport because:

  • There is only a minor difference between DFW and LAX in how early flights depart (median delays of -2 and -3 min), so we can ignore this attribute.
  • Even though the maximum departure delay of LAX is considerably higher, LAX has the better arrival figures, and most of its flights arrive before the scheduled time.
  • As a customer, I believe a delayed departure is acceptable as long as the flight still arrives on or before time.

2 Task 2

We need to assess two locations in Victoria, Tullamarine and Bendigo, for a new plane storage facility that has the least impact on local endangered species.

2.1 About the data

  • Overview
    • The data used for this task has been extracted from Atlas of Living Australia occurrence records.
    • The species being considered are Anthochaera (Xanthomyza) phrygia, Thinornis cucullatus, Perameles gunnii, Petauroides volans, Petrogale penicillata, Neophema (Neonanodes) chrysogaster and Ornithorhynchus anatinus.
    • In addition, we needed to filter the data to a 50 km radius around each of the two locations mentioned above. Therefore, we added 2 extra columns per location (4 in total) to the .csv using Excel, holding the coordinates of each location.
    • These 4 newly generated columns were used to calculate the distance of each species record from the locations using the formula “=6371*ACOS(COS(RADIANS(90-lat_loc1))*COS(RADIANS(90-lat_loc2))+SIN(RADIANS(90-lat_loc1))*SIN(RADIANS(90-lat_loc2))*COS(RADIANS(long_loc1-long_loc2)))”
    • The above formula was found here
  • This is observational data. It is provided on a voluntary basis, some by researchers, some by citizen scientists.
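
As a cross-check of the Excel distance formula, the same calculation (the spherical law of cosines with an Earth radius of 6371 km) can be sketched in base R. The function name distance_km and the sample coordinates are hypothetical and chosen only for illustration; they are not the real site or species coordinates.

```r
# Great-circle distance in km using the spherical law of cosines,
# mirroring the Excel formula (which works with colatitudes, 90 - latitude).
distance_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  to_rad <- pi / 180
  colat1 <- (90 - lat1) * to_rad
  colat2 <- (90 - lat2) * to_rad
  dlon   <- (lon1 - lon2) * to_rad
  cos_angle <- cos(colat1) * cos(colat2) +
    sin(colat1) * sin(colat2) * cos(dlon)
  # Clamp to [-1, 1] so floating-point rounding cannot make acos() return NaN.
  radius_km * acos(pmin(1, pmax(-1, cos_angle)))
}

# Hypothetical site on the equator and two hypothetical records east of it.
site_lat <- 0
site_lon <- 0
rec_lat  <- c(0, 0)
rec_lon  <- c(0.1, 1)

d <- distance_km(rec_lat, rec_lon, site_lat, site_lon)
within_50km <- d <= 50
print(round(d, 2))
print(within_50km)
```

The function is vectorised, so it can be applied to whole latitude and longitude columns at once, which would remove the need for the manual Excel step.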

2.2 Comparison of domestic storage sites

Table 2.1: No. of Endangered species records in 50 km Radius of Tullamarine
scientificName                         count
Ornithorhynchus anatinus               283
Petauroides volans                     22
Perameles gunnii                       10
Ornithorhynchus anatinus (Platypus)    5
Perameles gunnii Gray, 1838            2
Petrogale penicillata                  2
Thinornis cucullatus (Vieillot, 1818)  2
Ornithorhynchus anatinus (Shaw, 1799)  1
Petauroides volans (Greater Glider)    1

Figure 2.1: Spatial representation of Species in 50 km Radius of Tullamarine

Figure 2.2: Scatter Plot of Species in 50 km Radius of Tullamarine

Table 2.2: No. of Endangered species records in 50 km Radius of Bendigo
scientificName            count
Ornithorhynchus anatinus  131

Figure 2.3: Spatial representation of Species in 50 km Radius of Bendigo

Figure 2.4: Scatter Plot of Species in 50 km Radius of Bendigo

2.3 Calculations Used

  • In both Table 2.1 and Table 2.2, I have calculated the count of each species using group_by and summarise, and then arranged the counts in descending order.
  • In both Figure 2.1 and Figure 2.3, leaflet has been used to create a map, addTiles to add the base map tiles, and addCircles to show the spatial location of each species record as a circle.
  • In both Figure 2.2 and Figure 2.4, the functions ggplot(), xlab(), ylab(), geom_point() and ggplotly() have been used to create the interactive scatter plot.
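
Table 2.1 lists the same species under several name variants (with and without an authority or common name), so a plain count per scientificName splits one species across rows. A possible way to merge them before counting is sketched below. The sample rows are hypothetical, and keeping only the first two words of the name is an assumption that works for the variants in Table 2.1 but would need adjusting for names that include a subgenus in parentheses.

```r
library(dplyr)

# Hypothetical records mimicking the name variants seen in Table 2.1.
occ <- data.frame(
  scientificName = c(
    "Ornithorhynchus anatinus",
    "Ornithorhynchus anatinus (Platypus)",
    "Ornithorhynchus anatinus (Shaw, 1799)",
    "Perameles gunnii",
    "Perameles gunnii Gray, 1838"
  )
)

species_counts <- occ %>%
  # Keep only the first two words (genus and species epithet).
  mutate(species = sub("^([^ ]+ [^ ]+).*$", "\\1", scientificName)) %>%
  group_by(species) %>%
  summarise(count = n()) %>%
  arrange(desc(count))

print(species_counts)
```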

2.4 Summary/Conclusion

  • Table 2.1 and Table 2.2 tell us that 9 name variants, covering 5 distinct species (328 records in total), are recorded near Tullamarine, while only 1 species (131 records) is recorded near Bendigo.
  • Figure 2.1 and Figure 2.3 tell us exactly where the species can be seen geographically.
  • Figure 2.2 and Figure 2.4 tell us not only the locations but also how far each species is from Tullamarine/Bendigo.

Hence, we can conclude that Bendigo is more suitable for a new plane storage facility as it has the least impact on local endangered species.

3 Resources